[QUARK-479] Fix per-layer quant dtype in DeepSeek attention init #268
Open
thpereir wants to merge 2 commits into thpereir/quark_quant_layer from
Conversation
For mixed-precision models (e.g. MXFP4 MoE + FP8 attention), the attention block must resolve its own per-layer quant spec rather than using the global quant_config['quant_dtype'].

- Add _attn_spec / _attn_quant_dtype via quant_config.resolve(prefix)
- Use resolved dtype for FP4/FP8 decision in attention init
- Pass prefix to MergedReplicatedLinear for fused_qkv_a_proj
- Use resolved dtype for fuse_qknorm_quant decision

Tested with DeepSeek-R1-0528-moe-mxfp4-other-ptpc on TP=4.
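A minimal sketch of the intended wiring, under stated assumptions: the shape of `quant_config.resolve(prefix)`'s return value, its `dtype` field, the dtype string tokens, the class/parameter names, and the `MergedReplicatedLinear` constructor arguments are all illustrative, not the actual Quark/vLLM API.

```python
# Assumed import path for illustration; the real location may differ.
from vllm.model_executor.layers.linear import MergedReplicatedLinear


class DeepseekAttention:  # illustrative class name
    def __init__(self, hidden_size, qk_dim, kv_dim, quant_config, prefix: str):
        # Resolve *this* layer's spec instead of reading the global
        # quant_config["quant_dtype"].
        self._attn_spec = quant_config.resolve(prefix)
        self._attn_quant_dtype = self._attn_spec.dtype

        # FP4/FP8 decision keys off the per-layer dtype, so an MXFP4-MoE
        # model with FP8 attention takes the FP8 path here.
        # ("mxfp4" / "fp8" are placeholder dtype tokens.)
        self.use_fp4_attn = self._attn_quant_dtype == "mxfp4"

        # Forward the prefix so the fused projection resolves its own
        # per-layer spec as well (constructor arguments are assumed).
        self.fused_qkv_a_proj = MergedReplicatedLinear(
            hidden_size,
            [qk_dim, kv_dim],
            bias=False,
            quant_config=quant_config,
            prefix=f"{prefix}.fused_qkv_a_proj",
        )

        # The fuse_qknorm_quant fast path is likewise gated on the
        # resolved dtype rather than the global one.
        self.fuse_qknorm_quant = self._attn_quant_dtype == "fp8"
```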
Instead of checking the global quant_dtype to decide whether to bypass FP4 quantization for MTP layers, use quant_config.resolve(prefix) to check the per-layer spec. This correctly preserves FP8 quantization for MTP layer 61 when the global config is MXFP4 but the layer has an FP8 per-token override (as in the PTPC model format).
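A sketch of the per-layer check, with the same caveat that the `resolve()` return shape and dtype tokens are assumptions:

```python
# Illustrative only: resolve()'s return shape and dtype tokens are assumed.
def is_fp4_layer(quant_config, prefix: str) -> bool:
    """Per-layer FP4 check used for the MTP bypass decision."""
    layer_spec = quant_config.resolve(prefix)  # e.g. the MTP layer's prefix
    return layer_spec.dtype == "mxfp4"

# With a PTPC-style config the global quant_dtype reports MXFP4, but the MTP
# layer's FP8 per-token override makes is_fp4_layer(...) return False, so its
# FP8 quantization is preserved instead of being bypassed along the FP4 path.
```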
Depends on #236